CUMULVS: Extending a Generic Steering and Visualization Middleware for Application Fault-Tolerance

Authors

  • Philip M. Papadopoulos
  • James Arthur Kohl
  • Bart D. Semeraro
Abstract

CUMULVS is a middleware library that provides application programmers with a simple API for describing viewable and steerable fields in large-scale distributed simulations. These descriptions provide the data type, a logical name of the field/parameter, and the mapping of global indices to local indices (processor and physical storage) for distributed data fields. The CUMULVS infrastructure uses these descriptions to allow an arbitrary number of front-end "viewer" programs to dynamically attach to a running simulation, select one or more fields for visualization, and update steerable variables. (Viewer programs can be built using commercial visualization software such as AVS or custom software based on GUI interface builders like Tcl/Tk.) Although these data field descriptions require a small effort on the part of the application programmer, the payoff is a high degree of flexibility for the infrastructure and end-user. This flexibility has allowed us to extend the infrastructure to include "application-directed" checkpointing, where the application determines the essential state that must be saved for a restart. This has the advantage that checkpoints can be smaller and made portable across heterogeneous architectures using the semantic description information that can be included in the checkpoint file. Because many technical difficulties, such as efficient I/O handling and time-coherency of data, are shared between visualization and checkpointing, it is advantageous to leverage a checkpoint/restart system against a visualization/steering infrastructure. Also, because CUMULVS "understands" parallel data distributions, efficient parallel checkpointing is achievable with a minimal amount of effort on the programmer's part.
However, application scientists must still determine what makes up the essential state needed for an application restart and provide the proper logic for restarting from a checkpoint versus normal startup. This paper will outline the structure and communication protocols used by CUMULVS for visualization and steering. We will develop the similarities and differences between user-directed checkpointing and CUMULVS-based visualization. Finally, these concepts will be illustrated using a large synthetic seismic dataset code.

Research supported by the Applied Mathematical Sciences Research Program of the Office of Energy Research, U.S. Department of Energy, under contract DE-AC05-96OR22464 with Lockheed Martin Energy Research Corporation. 1060-3425/98 $10.0
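The abstract says a field description carries a data type, a logical name, and a mapping from global indices to local indices (processor and physical storage). The sketch below illustrates what such a mapping might look like for a one-dimensional block decomposition. It is not the CUMULVS API: the class `BlockField`, its attributes, and its methods are hypothetical names chosen for illustration, and block decomposition is only one assumed example of a data distribution.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BlockField:
    """Hypothetical descriptor for a 1-D block-distributed field.

    Illustrative only; this is not the CUMULVS interface.
    """
    name: str         # logical field name
    dtype: str        # element data type, e.g. "float64"
    global_size: int  # number of elements in the global array
    num_procs: int    # number of processes sharing the array

    def block_size(self) -> int:
        # Ceiling division: each process owns at most this many elements.
        return -(-self.global_size // self.num_procs)

    def global_to_local(self, g: int) -> Tuple[int, int]:
        """Map a global index to (owning process, local storage index)."""
        if not 0 <= g < self.global_size:
            raise IndexError(f"global index {g} out of range")
        return divmod(g, self.block_size())

    def local_to_global(self, proc: int, local: int) -> int:
        """Inverse mapping: (process, local index) back to a global index."""
        g = proc * self.block_size() + local
        if not 0 <= g < self.global_size:
            raise IndexError(f"({proc}, {local}) is outside the global array")
        return g


field = BlockField(name="pressure", dtype="float64", global_size=10, num_procs=4)
proc, local = field.global_to_local(7)
```

With a description of this kind, a viewer or a checkpoint writer could, in principle, locate any global element without knowing how the simulation laid out its storage, which is the property the abstract relies on for both visualization and parallel checkpointing.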


Related resources

Cumulvs: Interacting with High-Performance Scientific Simulations, for Visualization, Steering and Fault Tolerance

High-performance computer simulations are an increasingly popular alternative or complement to physical experiments or prototypes. However, as these simulations grow more massive and complex, it becomes challenging to monitor and control their execution. CUMULVS is a middleware infrastructure for visualizing and steering scientific simulations while they are running. Front-end “viewers” attach ...


Efficient and Flexible Fault Tolerance and Migration of Scientific Simulations Using CUMULVS

Many practical scientific computer applications would benefit from a simple checkpointing mechanism that provides automatic restart or recovery in response to faults and failures, and enables dynamic load balancing and improved resource utilization using task migration. However, developing applications with such capabilities, especially in distributed, heterogeneous operating environments, is v...


CUMULVS: Providing Fault-Tolerance, Visualization, and Steering of Parallel Applications

The use of visualization and computational steering can often assist scientists in analyzing large-scale scientific applications. Fault-tolerance to failures is of great importance when running on a distributed system. However, the details of implementing these features are complex and tedious, leaving many scientists with inadequate development tools. CUMULVS is a library that enables programme...


Integrating CUMULVS into AVS/Express

This paper discusses the development of a CUMULVS interface for runtime data visualization using the AVS/Express commercial visualization environment. The CUMULVS (Collaborative, User Migration, User Library for Visualization and Steering) system, developed at Oak Ridge National Laboratory, is an essential platform for interacting with high-performance scientific simulation programs on-the-fly....


CUMULVS: Providing Fault Tolerance, Visualization, and Steering of Parallel Applications. G. A. Geist II, James

The use of visualization and computational steering can often assist scientists in analyzing large-scale scientific applications. Fault tolerance to failures is of great importance when running on a distributed system. However, the details of implementing these features are complex and tedious, leaving many scientists with inadequate development tools. CUMULVS is a library that enables programmers to...




Publication date: 1998